DBM-Tree: A Dynamic Metric Access Method Sensitive to Local Density Data

Authors

  • Marcos R. Vieira
  • Caetano Traina
  • Fabio Jun Takada Chino
  • Agma J. M. Traina
Abstract

Metric Access Methods (MAMs) are employed to accelerate the processing of similarity queries, such as range and k-nearest neighbor queries. Current methods improve query performance by minimizing the number of disk accesses while keeping the height of the disk-resident structure constant (height-balanced trees). The Slim-tree and the M-tree are the most efficient dynamic MAMs so far. However, the overlap between their nodes strongly affects their performance. This paper presents a new dynamic MAM called the DBM-tree (Density-Based Metric tree), which can minimize the overlap between high-density nodes by relaxing the height-balancing of the structure. Thus, the height of the tree is larger in denser regions, in order to keep a tradeoff between breadth-searching and depth-searching. Moreover, an optimization algorithm called Shrink is also presented, which improves the performance of an already built DBM-tree by reorganizing the elements among its nodes. Experiments performed over both synthetic and real datasets showed that the DBM-tree is, on average, 50% faster than traditional MAMs and reduces the number of distance calculations by up to 72% and disk accesses by up to 54%. After applying the Shrink algorithm, performance improves by up to 30% in terms of the number of disk accesses for range and k-nearest neighbor queries. In addition, the DBM-tree scales up well, exhibiting sub-linear performance as the number of elements in the database grows.
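
The pruning idea behind the range queries mentioned in the abstract can be illustrated with a minimal sketch. It is not the DBM-tree algorithm from the paper. It only assumes a generic covering-radius tree (the principle that the M-tree and Slim-tree family relies on), in which a subtree is skipped when the triangle inequality shows that the query ball cannot intersect the node's ball. The names `Node`, `range_query`, `euclidean`, and the toy data are hypothetical. The toy tree is deliberately deeper in the dense region, to mirror the unbalanced layout the abstract describes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance; any metric satisfying the triangle inequality works."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


@dataclass
class Node:
    rep: Point      # representative element of the node (assumed layout)
    radius: float   # covering radius: upper bound on dist(rep, anything below)
    elements: List[Point] = field(default_factory=list)    # data held in this node
    children: List["Node"] = field(default_factory=list)   # subtrees of any height


def range_query(node: Node, query: Point, r: float,
                dist: Callable[[Point, Point], float],
                stats: Dict[str, int]) -> List[Point]:
    """Return every element within distance r of query, pruning disjoint balls."""
    stats["dist"] += 1
    if dist(query, node.rep) > r + node.radius:
        return []  # triangle inequality: nothing below can be within r
    stats["nodes"] += 1
    hits = []
    for e in node.elements:
        stats["dist"] += 1
        if dist(query, e) <= r:
            hits.append(e)
    for child in node.children:
        hits.extend(range_query(child, query, r, dist, stats))
    return hits


# Sparse region: a single shallow node.
sparse = Node(rep=(0.0, 0.0), radius=1.5,
              elements=[(0.0, 0.0), (1.0, 1.0), (-1.0, 0.0)])

# Dense region: one extra level, so this side of the tree is taller.
dense_a = Node(rep=(10.0, 10.0), radius=0.2,
               elements=[(10.0, 10.0), (10.1, 10.0), (10.0, 10.1)])
dense_b = Node(rep=(10.5, 10.5), radius=0.2,
               elements=[(10.5, 10.5), (10.6, 10.5)])
dense = Node(rep=(10.25, 10.25), radius=0.7, children=[dense_a, dense_b])

root = Node(rep=(5.0, 5.0), radius=8.0, children=[sparse, dense])

stats = {"dist": 0, "nodes": 0}
hits = range_query(root, (10.0, 10.0), 0.15, euclidean, stats)
print(sorted(hits), stats)
```

The `stats` counters stand in for the two costs the abstract reports: `dist` counts distance calculations and `nodes` counts visited nodes, a rough proxy for disk accesses. Nodes whose balls overlap the query ball cannot be pruned, which is why less overlap tends to mean fewer visits.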

Similar resources

Revisiting the DBM-Tree

In [Vieira et al. 2004] we presented a new dynamic Metric Access Method (MAM) called DBM-tree. This structure, unlike any other MAM, exploits the varying density of elements in the dataset, which allows creating unbalanced trees in a controlled way. Every dynamic MAM that works with persistent data proposed so far uses the same principle employed in conventional trees, like the B-tree [Comer 19...

Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection

Due to the rise of technology, the possibility of fraud in different areas such as banking has increased. Credit card fraud is a crucial problem in banking and its danger is ever increasing. This paper proposes an advanced data mining method, considering both feature selection and decision cost for accuracy enhancement of credit card fraud detection. After selecting the best and most effec...

Nearest Neighbours Search Using the PM-Tree

We introduce a method of searching for the k nearest neighbours (k-NN) using the PM-tree. The PM-tree is a metric access method for similarity search in large multimedia databases. As an extension of the M-tree, the structure of the PM-tree exploits local dynamic pivots (as the M-tree does) as well as global static pivots (used by LAESA-like methods). While in the M-tree a metric region is represented by a hyper-...

Improving the Pruning Ability of Dynamic Metric Access Methods with Local Additional Pivots and Anticipation of Information

Metric Access Methods (MAMs) have been shown to allow similarity queries over complex data to be performed more efficiently than with other access methods. They can be considered dynamic or static depending on the pivot type used in their construction. Global pivots tend to compromise the dynamicity of MAMs, as any pivot-related updates must be propagated through the entire structure, while local p...

Bulk-loading Dynamic Metric Access Methods

The main contribution of this paper is a bulk-loading algorithm for multi-way dynamic metric access methods based on the covering radius of a representative, like the Slim-tree. The proposed algorithm is sample-based, and it builds a height-balanced tree in a top-down fashion, using the metric domain’s distance function and a bound limit to group and determine the number of elements in each par...

Journal:
  • JIDM

Volume 1  Issue 

Pages  -

Publication date: 2004